AMENDMENTS TO THE CLAIMS:

1. (Currently amended) An apparatus for encoding a DNA sequence to achieve a high data
compression ratio for storage or transfer, which comprises:

a comparative unit for aligning a reference sequence having known DNA information with a
subject sequence to be compressed and extracting a difference between the reference
sequence and the subject sequence;

a conversion unit for converting information of the extracted difference between the reference
sequence and the subject sequence into a string of characters and for outputting the
string of characters;

a code storage unit for storing a conversion code that corresponds to a character to
represent the extracted difference; and

an encoding unit for encoding the individual characters that make up the string of characters
using the conversion code.
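The extraction step of the comparative unit can be illustrated with a minimal sketch. It assumes the alignment has already been done, so both sequences are given as equal-length strings with `-` marking a gap; the function name `extract_differences` and the `(start, reference bases, subject bases)` tuple layout are hypothetical choices for illustration, not taken from the claims.

```python
from typing import List, Optional, Tuple

# Hypothetical record for one difference run:
# (start position, reference bases, subject bases).
Diff = Tuple[int, str, str]


def extract_differences(ref_aligned: str, subj_aligned: str) -> List[Diff]:
    """Return maximal runs of positions where two pre-aligned sequences differ.

    Assumes both inputs are already aligned to the same length, with '-'
    marking a gap; the alignment itself is not shown here.
    """
    if len(ref_aligned) != len(subj_aligned):
        raise ValueError("aligned sequences must have equal length")
    diffs: List[Diff] = []
    start: Optional[int] = None
    for i, (r, s) in enumerate(zip(ref_aligned, subj_aligned)):
        if r != s and start is None:
            start = i  # a new difference run begins here
        elif r == s and start is not None:
            diffs.append((start, ref_aligned[start:i], subj_aligned[start:i]))
            start = None  # the run has ended
    if start is not None:  # the last run extends to the end of the sequence
        diffs.append((start, ref_aligned[start:], subj_aligned[start:]))
    return diffs
```

For example, `extract_differences("ACGTACGT", "ACCTA-GT")` yields one substitution run at position 2 and one blank (gap) run at position 5.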

2. (Currently amended) The apparatus of claim 1, wherein the characters to represent the
extracted difference comprise:

a character representing each DNA base symbol,

a numeric character representing a number of base positions that characterize a
feature of the extracted difference,

a character representing the starting or ending of the extracted difference, and

a character representing whether a type of extracted difference occurs in succession
in the subject sequence.

3. (Currently amended) The apparatus of claim 1, wherein respective information of a feature
of the extracted difference converted into the string of characters comprises:

starting of the extracted difference,

a start position of the extracted difference,

whether a type of extracted difference occurs in succession in the subject
sequence,

a number of continued bases in the extracted difference,

a base which the extracted difference comprises,

ending of the extracted difference, and

a distance between the start position and the end position of the extracted difference.
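The items listed above can be laid out as a character string in the order given. The following is a minimal sketch under stated assumptions: the marker characters `#` (start/end) and `+` (continuation), the decimal rendering of positions and counts, and the restriction to substitution runs of A/C/G/T are all hypothetical choices, not the claimed format.

```python
START_END = "#"     # hypothetical character marking the start or end of a difference
CONTINUATION = "+"  # hypothetical character marking bases that occur in succession


def difference_to_string(start: int, bases: str) -> str:
    """Render one substitution run as: start marker, start position,
    continuation marker, number of continued bases, the bases,
    end marker, and the distance between start and end positions."""
    if not bases or any(b not in "ACGT" for b in bases):
        raise ValueError("bases must be a non-empty string over A, C, G, T")
    if start < 0:
        raise ValueError("start position must be non-negative")
    count = len(bases)
    end = start + count - 1  # position of the last base in the run
    return f"{START_END}{start}{CONTINUATION}{count}{bases}{START_END}{end - start}"
```

For instance, a three-base run `ACG` starting at position 12 becomes `#12+3ACG#2`. In this layout the distance field is redundant with the count; it is kept only to mirror the list of items above.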

4. (Currently amended) The apparatus of claim 1, wherein a type of the extracted difference
comprises:

a start region mismatch between the reference sequence and the subject sequence;

a blank representing that there is no base at a base position in the subject sequence
corresponding to the reference sequence;

a single base pair mismatch between the reference sequence and the subject sequence;

a base insertion into the subject sequence;

a multiple base pair mismatch between the reference sequence and the subject sequence; and

an end region mismatch between the reference sequence and the subject sequence.
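The six difference types can be told apart from an aligned difference run. This sketch assumes the run is given as aligned reference and subject segments (with `-` for a gap) plus the aligned sequence length; the function name, the returned labels, and the order in which the checks are applied (region mismatches first) are hypothetical.

```python
def classify_difference(start: int, ref_bases: str, subj_bases: str, seq_len: int) -> str:
    """Assign one of the six difference types to an aligned difference run.

    The precedence of the checks is an illustrative assumption: a run touching
    either end of the alignment is treated as a region mismatch first.
    """
    if len(ref_bases) != len(subj_bases) or not ref_bases:
        raise ValueError("segments must be non-empty and of equal aligned length")
    end = start + len(ref_bases)
    if start == 0:
        return "start_region_mismatch"
    if end == seq_len:
        return "end_region_mismatch"
    if set(subj_bases) == {"-"}:
        return "blank"  # no base in the subject at these positions
    if set(ref_bases) == {"-"}:
        return "insertion"  # bases present only in the subject
    if len(ref_bases) == 1:
        return "single_mismatch"
    return "multiple_mismatch"
```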

5. (Currently amended) The apparatus of claim 1, wherein the conversion codes are 4-bit
codes, each of which corresponds to each of the characters.
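With 4-bit codes, two characters fit in one byte. The sketch below assumes a hypothetical 16-character table (four bases, ten digits, and the `#` and `+` markers); the table contents and ordering are illustrative only. Because all 16 codes are in use, an odd-length string is padded with code 0 and the caller keeps the original length for decoding.

```python
# Hypothetical 16-symbol table: each character maps to its index (0-15).
ALPHABET = "ACGT0123456789#+"
CODE = {ch: i for i, ch in enumerate(ALPHABET)}


def pack_4bit(text: str) -> bytes:
    """Encode each character as a 4-bit code and pack two codes per byte."""
    codes = [CODE[ch] for ch in text]  # KeyError for characters outside the table
    if len(codes) % 2:
        codes.append(0)  # pad; the original length must be stored separately
    return bytes((codes[i] << 4) | codes[i + 1] for i in range(0, len(codes), 2))


def unpack_4bit(data: bytes, length: int) -> str:
    """Decode packed 4-bit codes back into `length` characters."""
    chars = []
    for byte in data:
        chars.append(ALPHABET[byte >> 4])    # high nibble first
        chars.append(ALPHABET[byte & 0x0F])  # then low nibble
    return "".join(chars[:length])
```

A 10-character string such as `#12+3ACG#2` packs into 5 bytes, half the size of a one-byte-per-character encoding.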

6. (Currently amended) The apparatus of claim 1, which further comprises

a division unit for dividing the extracted difference into segments of a predetermined
size, and

wherein the conversion unit converts information of the extracted difference into the string
of characters to represent the extracted difference based on the segments.
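The division step can be sketched as splitting a long run of bases into fixed-size pieces, each carrying its own start position so it can be converted independently. The function name and the `(start, bases)` segment layout are hypothetical.

```python
from typing import List, Tuple


def split_difference(start: int, bases: str, size: int) -> List[Tuple[int, str]]:
    """Divide a difference run into segments of at most `size` bases.

    Each segment keeps its own start position; the last segment may be shorter.
    """
    if size <= 0:
        raise ValueError("segment size must be positive")
    return [(start + i, bases[i:i + size]) for i in range(0, len(bases), size)]
```

For example, a seven-base run starting at position 100 with a segment size of 3 gives segments starting at 100, 103, and 106. One plausible motivation is to keep the per-segment base counts short.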

7. (Currently amended) The apparatus of claim 1, which further comprises:
a compression unit for compressing the encoded subject sequence; and

a sequence storage unit for storing the compressed subject sequence.
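The compression and storage units can be sketched using general-purpose DEFLATE compression from Python's standard `zlib` module. This is a swapped-in technique, since the claims do not name a compression algorithm. The in-memory dictionary standing in for the sequence storage unit and the function names are also hypothetical.

```python
import zlib
from typing import Dict

# Hypothetical stand-in for the sequence storage unit: sequence ID -> compressed bytes.
SequenceStore = Dict[str, bytes]


def compress_and_store(store: SequenceStore, seq_id: str, encoded: bytes) -> int:
    """Compress an encoded subject sequence with zlib and store it; return stored size."""
    blob = zlib.compress(encoded, 9)  # level 9: best compression, slowest
    store[seq_id] = blob
    return len(blob)


def load_and_decompress(store: SequenceStore, seq_id: str) -> bytes:
    """Retrieve a stored sequence and undo the compression step."""
    return zlib.decompress(store[seq_id])
```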

8. (Currently amended) The apparatus of claim 1, which further comprises

a pre-processing unit for modifying the reference sequence using a variation sequence
generation factor created by a variation sequence generation function that uses
random variables as inputs.
9. (Currently amended) The apparatus of claim 8, wherein the variation sequence
generation factor comprises a total number of variations, a distance between two adjacent
variations, a length of a variation, a type of the variation, and a variation
sequence of the variation.
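The generation factor can be sketched as a list of variation records drawn from a seeded random number generator and then applied to the reference. Assumptions in this sketch: the total number of variations is the list length, the distance is measured from the end of the previous variation, the types are limited to substitution, insertion, and deletion, and the names `Variation`, `generate_factor`, `apply_factor`, `max_gap`, and `max_len` are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Variation:
    gap: int       # distance from the end of the previous variation
    length: int    # number of bases affected
    kind: str      # "sub", "ins" or "del" (hypothetical type set)
    sequence: str  # bases used for "sub"/"ins"; empty for "del"


def generate_factor(rng: random.Random, total: int,
                    max_gap: int = 20, max_len: int = 3) -> List[Variation]:
    """Draw `total` random variations; the list length is the total number."""
    variations = []
    for _ in range(total):
        kind = rng.choice(["sub", "ins", "del"])
        length = rng.randint(1, max_len)
        seq = "" if kind == "del" else "".join(rng.choice("ACGT") for _ in range(length))
        variations.append(Variation(rng.randint(1, max_gap), length, kind, seq))
    return variations


def apply_factor(reference: str, variations: List[Variation]) -> str:
    """Modify the reference by applying variations left to right.

    Variations that would run past the end of the reference are dropped.
    A random substitution may coincide with the original bases.
    """
    out: List[str] = []
    pos = 0
    for v in variations:
        nxt = pos + v.gap
        consumed = 0 if v.kind == "ins" else v.length
        if nxt + consumed > len(reference):
            break  # not enough reference left for this variation
        out.append(reference[pos:nxt])  # unchanged bases before the variation
        if v.kind in ("sub", "ins"):
            out.append(v.sequence)
        pos = nxt + consumed  # skip replaced or deleted reference bases
    out.append(reference[pos:])
    return "".join(out)
```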

10. (Withdrawn) A method for encoding a DNA sequence, which comprises: 

aligning a reference sequence having known DNA information with a subject sequence to be 
encoded; 

extracting a difference between the reference sequence and the subject sequence; 

converting information of the extracted difference between the reference sequence and the 
subject sequence into a string of predetermined characters; and 

encoding the individual characters that make the string of the predetermined characters using 
predetermined conversion codes that correspond to the individual characters. 

11. (Withdrawn) The method of claim 10, wherein the characters comprise a first character
representing DNA base symbols, a second character representing the number of the difference, a
third character representing the starting and ending of the difference, and a fourth character
representing continuation of the difference.

12. (Withdrawn) The method of claim 11, wherein converting comprises:
allotting the third character for the starting of the difference; 

allotting the second character for the starting position of the difference; 

allotting the fourth character for the continuation of the difference; 

allotting the second character for the number of the continued bases of the difference; 

allotting the first character for the bases of the difference; 

allotting the third character for the ending of the difference; 

allotting the second character for the distance between the start position and the end position 
of the difference; and 

outputting the string of the allotted characters. 

13. (Withdrawn) The method of claim 10, wherein the difference comprises start region
mismatch between the reference sequence and the subject sequence, blank by base deletion of the
subject sequence corresponding to the reference sequence, single base pair mismatch between the
reference sequence and the subject sequence, base insertion into the subject sequence, multiple base
pair mismatch between the reference sequence and the subject sequence, and end region mismatch
between the reference sequence and the subject sequence.

14. (Withdrawn) The method of claim 10, wherein the conversion codes are 4-bit codes, each
of which corresponds to each of the characters.

15. (Withdrawn) The method of claim 10, which further comprises dividing the extracted 
difference into segments of predetermined sizes, and 

wherein in converting, information of the extracted difference is converted into the string of
the characters based on the segments.

16. (Withdrawn) The method of claim 10, which further comprises:
compressing the encoded subject sequence; and

storing the compressed subject sequence.

17. (Withdrawn) The method of claim 10, which further comprises, before aligning, creating
a variation sequence induction factor from a variation sequence induction function that uses random 
variables as inputs and modifying the reference sequence using the created variation sequence 
induction factor. 

18. (Withdrawn) The method of claim 17, wherein the variation sequence induction factor 
comprises the total number of variations, distance between the variations, length of the variations, 
type of the variations, and a variation sequence. 

19. (Currently amended) A computer-readable medium having embodied thereon a computer
program for a method for encoding a DNA sequence to achieve a high data compression ratio, the
method comprising:

aligning a reference sequence having known DNA information with a subject sequence to be
encoded;

extracting a difference between the reference sequence and the subject sequence;

converting information of the extracted difference between the reference sequence and the
subject sequence into a string of predetermined characters; and

encoding the individual characters that make up the string of characters using a predetermined
conversion code that corresponds to an individual character,

wherein the computer-readable medium is not a carrier wave.